CHAPTER 1


INTRODUCTION 

A number of problems in source coding deal with a pair of correlated
discrete memoryless sources and two separate, non-cooperating encoders, as
in Fig. 1. Among these are the Slepian-Wolf problem [1], the
side-information problem [2], [3], and the Wyner-Ziv problem [4]. The general
goal of these problems is to determine the rate pairs $(R_1,R_2)$ which allow the
outputs of each of the two sources to be reproduced at the decoder with some
specified distortion. The rate $R_i$ is the rate of transmission of information
from the i-th source encoder to the decoder. If the joint distribution
p, where $p(u,v) = P\{X^{(1)} = u, X^{(2)} = v\}$, is completely known to both
encoders, then solutions to these problems are known. Here we consider
these problems assuming only that the joint distribution is known to be in
some class A.

(Note that since $X^{(1)}$ and $X^{(2)}$, the random variables representing the
source outputs, are discrete, all of the distributions here are
probability mass functions.)

Figure 1. Encoder-decoder configuration

For problems with only a single source, universal codes are source
codes which achieve some performance measure (e.g., the entropy or
the rate-distortion function) asymptotically for all distributions in some
class [12]. For the Slepian-Wolf and side-information problems, where
decoding with arbitrarily low distortion is required, universal coding is
not possible in general. This can be seen as follows. Let the marginal
distribution for $X^{(i)}$ be denoted by $p_i$. The rate pair for any code depends
only on the marginal distributions. Thus, for a given code only a single
rate pair $(R_1,R_2)$ is possible for any set of joint distributions which
have the same marginal distributions $p_1$ and $p_2$. The encoders cannot
distinguish between different joint distributions which have the same
marginals, since each encoder observes the output of only one of the two
sources. So if the true joint distribution is $\pi$, then all $p \in A$ such that
$p_i = \pi_i$ (i = 1,2) remain as possible distributions as far as the source
encoders are concerned, even if the marginals $\pi_1$ and $\pi_2$ are known exactly.
However, a single rate pair $(R_1,R_2)$ must be used for all of them. Therefore
universal coding is possible only if those $p \in A$ with the same marginals
also have the same achievable rate regions.
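The obstruction described above can be illustrated numerically. The following is a minimal sketch, not taken from the thesis: the two joint distributions (independent uniform bits and identical uniform bits) and the helper names are illustrative assumptions. Both sources have the same pair of marginals, yet their conditional entropies, and hence their Slepian-Wolf regions, differ.

```python
import math

def marginals(joint):
    """Return the two marginals of a joint pmf given as {(u, v): prob}."""
    p1, p2 = {}, {}
    for (u, v), pr in joint.items():
        p1[u] = p1.get(u, 0.0) + pr
        p2[v] = p2.get(v, 0.0) + pr
    return p1, p2

def cond_entropy_1_given_2(joint):
    """Conditional entropy of X1 given X2, in bits."""
    _, p2 = marginals(joint)
    return -sum(pr * math.log2(pr / p2[v])
                for (u, v), pr in joint.items() if pr > 0)

# Hypothetical two-member class: same marginals, different dependence.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
identical = {(0, 0): 0.5, (1, 1): 0.5}

same_marginals = marginals(independent) == marginals(identical)
h_independent = cond_entropy_1_given_2(independent)
h_identical = cond_entropy_1_given_2(identical)
print(same_marginals, h_independent, h_identical)
```

An encoder that sees only one source cannot tell these two cases apart, although the first needs a full bit per letter at the corner point and the second needs none.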

For the Wyner-Ziv problem the same reasoning shows that a single rate
pair must correspond to each set of $p \in A$ with a given pair $(p_1,p_2)$ of
marginal distributions. However, a code may still be universal if it
achieves the optimum distortion for each such p. Universal fixed-rate
codes for single sources do exist if some positive distortion is allowed
[13].

If  a  code  for  a  class  of  sources  achieves  a  point  (R,D)  on  the  rate 
distortion  curve  for  one  source  in  the  class  and  yields  rate  not  greater 
than  R  and  distortion  not  greater  than  D  for  all  other  sources  in  the 
class,  it  will  be  called  a  robust  code  [14].  Here  we  show  that  robust 
codes  do  exist  for  the  Slepian-Wolf  and  side-information  problems.  For 
the  Wyner-Ziv  problem  an  example  is  given  showing  that  robust  coding  is 
not  possible  in  general  (so  universal  coding  is  not  possible  either). 
Suppose that the true distribution is given by p. Then the i-th
encoder can only reduce A to a subset $A_i(p_i) = \{\pi \in A : \pi_i = p_i\}$, which is
the set of distributions with the same i-th marginal as p. In the cases
where robust coding is possible, optimal performance will be achieved by
having the i-th encoder estimate $p_i$ and then do robust coding for $A_i(p_i)$.
The sets $A_1(p_1)$ and $A_2(p_2)$ are different, so the choice of rates for
this coding is not simple in general. In the side-information problem
only $A_1(p_1)$ is used, and hence a coding technique which gives optimal
performance is easily obtained. This is not the case with the Slepian-Wolf
problem. A non-computable characterization of the set of achievable
rate pairs for this problem is given in [7] and can be described as
follows. Let $\mathbb{P}_1$ and $\mathbb{P}_2$ be the spaces of possible marginals for $X^{(1)}$ and
$X^{(2)}$ respectively. A rate pair $(R_1,R_2)$ is achievable for a given $\pi$
if there exist functions $f_1 : \mathbb{P}_1 \to [0,\infty)$ and $f_2 : \mathbb{P}_2 \to [0,\infty)$ such that
$R_i = f_i(\pi_i)$ and also such that for all $p \in A$ the rate pair $(f_1(p_1), f_2(p_2))$
is in the Slepian-Wolf region for the pair of sources with joint distribution
p. In Chapter 6 an upper bound to the set of achievable rate pairs
is given, and a number of such functions $f_1$ and $f_2$ which yield sets of
achievable rate pairs are considered.

By  way  of  introduction  a  special  case  of  the  Slepian-Wolf  problem 
(which  is  also  a  special  case  of  the  side-information  problem)  is  considered 
in  Chapter  3.  The  robust  coding  result  for  the  side-information  case  is 
derived  in  Chapter  4.  Chapter  5  concerns  the  Wyner-Ziv  problem.  Here  an 
example  is  given  which  proves  that  robust  coding  is  not  possible  in  general, 
and  a  special  case  is  presented  where  robust  coding  is  possible. 
CHAPTER 2

PRELIMINARIES 

An encoder may make an estimate of its marginal distribution (which is,
with high probability, within a prescribed accuracy) by observing the
relative frequency of source letter outputs during some initial estimation
time. Similarly, if the encoders send information at rate $R_i$ equal to the
entropy of the i-th source (i = 1,2) for some initial time, the decoder
can estimate the joint distribution. The arbitrarily small uncertainties
in these estimates have no effect on the achievable rate regions if the
class A is assumed to be closed. Hence, in the body of the thesis it is
assumed that the marginal distribution for the i-th source is known to the i-th
encoder (i = 1,2) and the joint distribution is known to the decoder. In
addition it is assumed that for some $\alpha > 0$, $\pi(u,v) \ge \alpha$ for all
$(u,v) \in \mathcal{X}_1 \times \mathcal{X}_2$ and all $\pi \in A$. The results needed to prove the
coding theorems without these assumptions are derived in Appendix A.

The alphabet for the i-th source is denoted by $\mathcal{X}_i$, which is a finite
set with $|\mathcal{X}_i|$ elements. The output of the i-th source at time k is a random
variable denoted $X_k^{(i)}$ with marginal distribution $p_i$; i.e., $P\{X_k^{(i)} = u\} = p_i(u)$
for $u \in \mathcal{X}_i$. The random variables $(X_k^{(1)}, X_k^{(2)})$ have joint distribution p
and are independent of $(X_j^{(1)}, X_j^{(2)})$ for $j \ne k$. Define $\mathbf{X}^{(i)}$ to be a sequence
of n outputs from the i-th source, $\{X_1^{(i)}, \ldots, X_n^{(i)}\}$, and let $\mathcal{X}_i^n$ be the set of
all n-sequences whose components are elements of $\mathcal{X}_i$. A blocklength n code
for the i-th source is a function $g_i : \mathcal{X}_i^n \to \{1,2,\ldots,\|g_i\|\}$. The rate of this
code is given by

$$R_i = n^{-1} \log \|g_i\|.$$
(Base two logarithms are used throughout.) The decoder is defined by a
function

$$f : \{1,\ldots,\|g_1\|\} \times \{1,\ldots,\|g_2\|\} \to \hat{\mathcal{X}}_1^n \times \hat{\mathcal{X}}_2^n$$

where $\hat{\mathcal{X}}_i$ is the reproduction alphabet for the i-th source. Thus the decoder
output which corresponds to the source output vectors $(\mathbf{X}^{(1)},\mathbf{X}^{(2)})$ is

$$(\hat{\mathbf{X}}^{(1)},\hat{\mathbf{X}}^{(2)}) = f[g_1(\mathbf{X}^{(1)}), g_2(\mathbf{X}^{(2)})].$$

The distortion achieved by a code is

$$E\{d(\mathbf{X}^{(i)},\hat{\mathbf{X}}^{(i)})\} \triangleq E\Big\{n^{-1} \sum_{j=1}^{n} d_i(X_j^{(i)},\hat{X}_j^{(i)})\Big\},$$

the expected per letter distortion between $\mathbf{X}^{(i)}$ and $\hat{\mathbf{X}}^{(i)}$ computed using a
single letter distortion measure $d_i(\cdot,\cdot)$ defined on $\mathcal{X}_i \times \hat{\mathcal{X}}_i$. A rate pair
$(R_1,R_2)$ will be called achievable if for all positive $\epsilon$ and $\delta$, codes of rates
$R_i' < R_i + \epsilon$ exist for which $E\{d(\mathbf{X}^{(i)},\hat{\mathbf{X}}^{(i)})\} < D_i + \delta$, where $D_i$ is the specified
distortion level for the i-th source. The set of all achievable rate pairs
will be called the achievable rate region.

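Let $\mathbf{x}^{(i)}$ denote the n-vector $\{x_1^{(i)}, \ldots, x_n^{(i)}\}$ where $x_j^{(i)} \in \mathcal{X}_i$ for $1 \le j \le n$,
and define $N(u;\mathbf{x}^{(i)})$ to be the number of occurrences of the letter u in the
sequence $\mathbf{x}^{(i)}$. A sequence $\mathbf{x}^{(i)}$ is called $\delta$-typical if

$$|n^{-1} N(u;\mathbf{x}^{(i)}) - p_i(u)| < \delta |\mathcal{X}_i|^{-1} \tag{1}$$

for all $u \in \mathcal{X}_i$, $\delta > 0$. The set of $\delta$-typical sequences will be denoted
$T_n(\delta,p_i)$. The set $T_n(\delta,p_i)$ has the following properties, which are proved in
[5] and [11].

Property 1. For any $\delta > 0$, $P\{\mathbf{X}^{(i)} \in T_n(\delta,p_i)\} \to 1$ as $n \to \infty$
uniformly on A.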
Property 2. $\exp_2[n(H(p_i) - c)] < |T_n(\delta,p_i)| < \exp_2[n(H(p_i) + c)]$
where

$$c = -\delta \log \alpha \ge -\delta |\mathcal{X}_i|^{-1} \sum_{u \in \mathcal{X}_i} \log p_i(u)$$

and

$$H(p_i) = -\sum_{u \in \mathcal{X}_i} p_i(u) \log p_i(u).$$

(Here $\exp_2(a) = 2^a$ for any real number a.)
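The typicality test and the upper bound of Property 2 can be checked by brute force on a small case. This is only a sketch under stated assumptions: the binary source, the blocklength n = 12, the value of delta, and the choice of alpha as the smallest letter probability are all illustrative, and the threshold delta/|X| is the reading of inequality (1) used here.

```python
import math
from fractions import Fraction
from itertools import product

def is_typical(x, p, delta):
    """Inequality (1): |N(u;x)/n - p(u)| < delta/|X| for every letter u."""
    n = len(x)
    return all(abs(Fraction(x.count(u), n) - p[u]) < delta / len(p) for u in p)

def entropy(p):
    """Entropy in bits of a pmf given as {letter: probability}."""
    return -sum(float(q) * math.log2(float(q)) for q in p.values() if q > 0)

# Hypothetical binary source; exact fractions avoid rounding at the threshold.
p = {0: Fraction(3, 10), 1: Fraction(7, 10)}
delta = Fraction(1, 10)
alpha = min(p.values())      # assumed lower bound on the letter probabilities
n = 12

# Count the delta-typical sequences by enumerating all 2**n sequences.
size = sum(is_typical(x, p, delta) for x in product(p, repeat=n))
c = -float(delta) * math.log2(float(alpha))
upper = 2 ** (n * (entropy(p) + c))
print(size, upper)
```

Only sequences with exactly four zeros pass the test for these parameters, so the count is a single binomial coefficient, well below the bound.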

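Similarly define $N(u,v;\mathbf{x}^{(1)},\mathbf{x}^{(2)})$ to be the number of indices j such
that $(u,v) = (x_j^{(1)}, x_j^{(2)})$, $1 \le j \le n$, the $x_j^{(i)}$ being the components of $\mathbf{x}^{(i)}$.
Then call $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ jointly $\delta$-typical if

$$|n^{-1} N(u,v;\mathbf{x}^{(1)},\mathbf{x}^{(2)}) - p(u,v)| < \delta |\mathcal{X}_1|^{-1} |\mathcal{X}_2|^{-1} \tag{2}$$

for all $(u,v) \in \mathcal{X}_1 \times \mathcal{X}_2$, where p is the joint distribution of $X^{(1)}$ and $X^{(2)}$.
The set of jointly $\delta$-typical pairs $(\mathbf{x}^{(1)},\mathbf{x}^{(2)})$ will be denoted $J_n(\delta,p)$. Note
that

$$(\mathbf{x}^{(1)},\mathbf{x}^{(2)}) \in J_n(\delta,p) \Rightarrow \mathbf{x}^{(i)} \in T_n(\delta,p_i).$$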
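The implication just noted (joint typicality forces each component to be typical) can be confirmed exhaustively for a short blocklength. The sketch below rests on assumptions that are not in the thesis: the particular joint distribution, n = 8, delta = 1/2, and the helper names. Joint typicality is tested by treating the pair sequence as a single sequence over the product alphabet, with thresholds delta/|X| and delta/(|X1||X2|).

```python
from fractions import Fraction
from itertools import product

def is_typical(seq, pmf, delta):
    """|N(a;seq)/n - pmf(a)| < delta/(alphabet size) for every letter a."""
    n = len(seq)
    return all(abs(Fraction(seq.count(a), n) - pmf[a]) < delta / len(pmf)
               for a in pmf)

# Hypothetical joint pmf on {0,1} x {0,1}; both marginals are uniform.
p = {(0, 0): Fraction(3, 8), (0, 1): Fraction(1, 8),
     (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}
p1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p2 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
delta = Fraction(1, 2)
n = 8

joint_count = 0
implication_holds = True
for pairs in product(p, repeat=n):           # all sequences of (u, v) pairs
    if is_typical(pairs, p, delta):          # inequality (2)
        joint_count += 1
        x1 = tuple(u for u, _ in pairs)
        x2 = tuple(v for _, v in pairs)
        ok = is_typical(x1, p1, delta) and is_typical(x2, p2, delta)
        implication_holds = implication_holds and ok
print(joint_count, implication_holds)
```

For these parameters the jointly typical pairs are exactly those with letter-pair counts (3, 1, 1, 3), which gives a multinomial count of 1120.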

If the pair of random vectors $(\mathbf{X}^{(1)},\mathbf{X}^{(2)})$ is considered a single
vector of independent random variables $\{(X_1^{(1)},X_1^{(2)}), \ldots, (X_n^{(1)},X_n^{(2)})\}$,
each with distribution p, then the defining inequality (1) for $\delta$-typical sequences is
equivalent to inequality (2). Thus directly from the properties of
$T_n(\delta,p_i)$ we have

$$P\{(\mathbf{X}^{(1)},\mathbf{X}^{(2)}) \in J_n(\delta,p)\} \to 1 \text{ as } n \to \infty$$

and

$$\exp_2[n(H(p) - c)] < |J_n(\delta,p)| < \exp_2[n(H(p) + c)]$$

where $H(p)$ is the joint entropy of $X^{(1)}$ and $X^{(2)}$.

For the Slepian-Wolf and side information problems we define a robust
code as follows. Let $\mathcal{B}(\pi)$ denote the set of boundary points of the
achievable rate region for the case in which the joint distribution $\pi$ is
known to both encoders, and let $R_i(n,p)$ be the rate of a code of blocklength
n for the i-th encoder when the source distribution is p. A
sequence of codes of increasing blocklength n for a class of sources A is
robust if the codes achieve zero distortion uniformly as $n \to \infty$ for all
sources in A and if for some $\pi \in A$ the rate pairs $(R_1(n,p), R_2(n,p))$ (which
may be a function of p, the true source distribution) satisfy

$$R_i(n,p) < R_i^* + \epsilon_n, \quad i = 1,2,$$

for all $p \in A$ and n, where $\epsilon_n \to 0$ as $n \to \infty$ and is independent of p, and
$(R_1^*,R_2^*) \in \mathcal{B}(\pi)$. So a sequence of codes is robust if it achieves asymptotically
zero distortion for all sources in the class, with rates that asymptotically
do not exceed a boundary point of the rate region of some source in the class.

To modify this definition for the Wyner-Ziv problem let $R_\pi^*(D)$ be the
Wyner-Ziv rate distortion function for source $\pi$. Then call a sequence of
blocklength n codes robust if the rate $R(n,\hat{\pi})$ and distortion $D(n,\hat{\pi})$ converge
to a point on $R_{\hat{\pi}}^*(D)$ for some source $\hat{\pi} \in A$ and if $R(n,\pi) \le R(n,\hat{\pi})$
and $D(n,\pi) \le D(n,\hat{\pi})$ for all $\pi \in A$.
CHAPTER 3

ROBUST CODING FOR THE CORNER POINT OF THE SLEPIAN-WOLF REGION

3.1 Statement of Problem and Preliminary Result

The Slepian-Wolf problem is to determine the set of rate pairs $(R_1,R_2)$
which allow the outputs of both sources to be reproduced at the decoder with
arbitrarily low distortion. In this problem $\hat{\mathcal{X}}_i = \mathcal{X}_i$, and the distortion
measure is the Hamming distortion measure (i.e., $d_i(u,u) = 0$, $d_i(u,v) = 1$
if $u \ne v$), so arbitrarily low probability of decoding error corresponds to
arbitrarily low distortion. Here we derive a robust coding result for the
special case where $R_2 > H(p_2)$. Since $R_2 > H(p_2)$ the decoder may be assumed
to know $\{X_k^{(2)}\}$ exactly, and the problem becomes the determination of the set
of rates $R_1$ which allow $\{X_k^{(1)}\}$ to be recovered with arbitrarily low distortion.
This is the corner point of the Slepian-Wolf region (see Fig. 2).
Notice that an increase in $R_2$ above $H(p_2)$ does not permit a decrease in $R_1$.

The joint distribution is known to be in some class A. In addition the
encoder knows the marginal $p_1$, and the decoder knows the true joint distribution
p. However, the encoder does not know the joint distribution p nor the
marginal $p_2$. In this case $R_1$ is achievable if and only if

$$R_1 \ge \sup\{h_1(\pi) : \pi \in A_1(p_1)\} \tag{3}$$

where $A_1(p_1) = \{\pi \in A : \pi_1 = p_1\}$ as before, and $h_1(\pi)$ is the average
conditional entropy given by

$$h_1(\pi) \triangleq -\sum_u \sum_v \pi(u,v) \log \frac{\pi(u,v)}{\sum_{u'} \pi(u',v)}. \tag{4}$$

Figure 2. Slepian-Wolf problem. (a) Encoder-decoder configuration for
corner point. (b) Slepian-Wolf rate region for a source p.
Notice that for any distribution p on $\mathcal{X}_1 \times \mathcal{X}_2$,

$$h_1(p) = H(p) + \sum_u \sum_v p(u,v) \log p_2(v) = H(p) - H(p_2). \tag{5}$$

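Identity (5) is easy to confirm numerically. The sketch below evaluates the conditional entropy directly from definition (4) and compares it with the difference of the joint entropy and the entropy of the second marginal; the doubly symmetric binary source with crossover 0.1 and the function names are illustrative assumptions.

```python
import math

def entropy(pmf):
    """Entropy in bits of a pmf given as a dict of probabilities."""
    return -sum(pr * math.log2(pr) for pr in pmf.values() if pr > 0)

def second_marginal(joint):
    p2 = {}
    for (u, v), pr in joint.items():
        p2[v] = p2.get(v, 0.0) + pr
    return p2

def h1(joint):
    """Average conditional entropy of X1 given X2, as in definition (4)."""
    p2 = second_marginal(joint)
    return -sum(pr * math.log2(pr / p2[v])
                for (u, v), pr in joint.items() if pr > 0)

# Hypothetical example: doubly symmetric binary source, crossover 0.1.
p = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}
lhs = h1(p)
rhs = entropy(p) - entropy(second_marginal(p))
print(lhs, rhs)
```

For this source both sides equal the binary entropy of 0.1, about 0.469 bits.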

3.2  Positive  Coding  Theorem 
Define

$$\hat{h}_1(p_1) \triangleq \sup\{h_1(\pi) : \pi \in A_1(p_1)\}. \tag{6}$$

The proof of the positive coding theorem follows directly from that of
Berger [5, Theorem 3.2], simply substituting $\hat{h}_1(p_1)$ for $h_1(\pi)$ and fixing
$R_2 > H(p_2)$. This proof with the necessary modifications is as follows.

Form $N = \exp_2[n(\hat{h}_1(p_1) + 2\gamma)]$ sets of $\mathbf{x}^{(1)}$ sequences, say $S_1, \ldots, S_N$,
by selecting sequences independently from $T_n(\delta,p_1)$ according to a uniform
distribution. The selections are made with replacement so that every $\delta$-typical $\mathbf{x}^{(1)}$
sequence is equally likely to be selected each time. This is done until
each set contains

$$|S| = \exp_2[n(H(p_1) - \hat{h}_1(p_1) - \gamma)]$$

$\delta$-typical sequences.

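A sequence $\mathbf{x}^{(1)}$ is encoded into an index $i(\mathbf{x}^{(1)})$ defined by

$$i(\mathbf{x}^{(1)}) = \begin{cases} j & \text{if } j = \min\{k : \mathbf{x}^{(1)} \in S_k\} \\ 0 & \text{if } \mathbf{x}^{(1)} \notin S_k,\ k = 1,\ldots,N. \end{cases} \tag{7}$$

The index $i(\mathbf{x}^{(1)})$ is then sent to the decoder, requiring a rate

$$R_1 = \hat{h}_1(p_1) + 2\gamma. \tag{8}$$

An error occurs if $i(\mathbf{x}^{(1)}) = 0$. Since

$$P\{\mathbf{X}^{(1)} \notin T_n(\delta,p_1)\} \to 0 \text{ as } n \to \infty,$$

we will know

$$P\{i(\mathbf{X}^{(1)}) = 0\} \to 0 \text{ as } n \to \infty$$

if

$$Q = P\{i(\mathbf{X}^{(1)}) = 0 \mid \mathbf{X}^{(1)} \in T_n(\delta,p_1)\} \to 0 \text{ as } n \to \infty.$$

Now

$$Q = \prod_{j=1}^{N} P\{\mathbf{X}^{(1)} \notin S_j \mid \mathbf{X}^{(1)} \in T_n(\delta,p_1)\}$$
$$= \prod_{j=1}^{N} \prod_{\mathbf{u} \in S_j} P\{\mathbf{X}^{(1)} \ne \mathbf{u} \mid \mathbf{X}^{(1)} \in T_n(\delta,p_1)\}$$
$$= P\{\mathbf{X}^{(1)} \ne \mathbf{u} \mid \mathbf{X}^{(1)} \in T_n(\delta,p_1)\}^{N|S|}$$
$$= \{1 - |T_n(\delta,p_1)|^{-1}\}^{N|S|}$$
$$\le (1 - \exp_2[-n(H(p_1) + c)])^{N|S|} \tag{9}$$

since the selections were done independently using a uniform distribution.
Since $\ln(1-x) \le -x$, $0 \le x < 1$, then

$$\ln Q \le N|S| \ln\{1 - \exp_2[-n(H(p_1) + c)]\}$$
$$\le -N|S| \exp_2[-n(H(p_1) + c)]$$
$$= -\exp_2[n(\gamma - c)]. \tag{10}$$

So $\ln Q \to -\infty$ and $P\{i(\mathbf{X}^{(1)}) = 0\} \to 0$ as $n \to \infty$ if $\gamma > c$.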
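The encoding rule and the behaviour of the bound on ln Q can be sketched in a few lines. This is an illustration under assumptions, not the construction used in the proof: the three small bins are hand-picked rather than drawn at random from the typical set, and the values of gamma and c are arbitrary numbers with gamma > c.

```python
def encode_index(x, bins):
    """Smallest 1-based k with x in S_k, or 0 if no bin contains x."""
    for k, s in enumerate(bins, start=1):
        if x in s:
            return k
    return 0

def ln_q_bound(n, gamma, c):
    """Final bound on ln Q: -2**(n*(gamma - c)), which diverges if gamma > c."""
    return -2.0 ** (n * (gamma - c))

# Hypothetical bins of length-3 binary sequences (chosen by hand for clarity).
bins = [{(0, 1, 1)}, {(1, 0, 1), (0, 1, 1)}, {(1, 1, 0)}]
first = encode_index((0, 1, 1), bins)    # in S_1 and S_2: smallest index wins
second = encode_index((1, 0, 1), bins)
missing = encode_index((0, 0, 0), bins)  # in no bin: index 0, an encoding error

bounds = [ln_q_bound(n, gamma=0.05, c=0.01) for n in (10, 100, 1000)]
print(first, second, missing, bounds)
```

The bound decreases doubly exponentially fast in n, which is why the encoding failure probability vanishes as soon as gamma exceeds c.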

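Now $P\{(\mathbf{X}^{(1)},\mathbf{X}^{(2)}) \in J_n(\delta,p)\} \to 1$ as $n \to \infty$, so we may assume that the
source outputs $(\mathbf{x}^{(1)},\mathbf{x}^{(2)})$ are jointly $\delta$-typical. The decoder observes
$\mathbf{x}^{(2)}$ and knows an index j such that $\mathbf{x}^{(1)} \in S_j$. The decoding procedure is
to search $S_j$ for a sequence $\mathbf{u}$ such that $(\mathbf{u},\mathbf{x}^{(2)}) \in J_n(\delta,p)$. As an upper
bound assume an error occurs if there is some sequence $\mathbf{u} \in S_j$, $\mathbf{u} \ne \mathbf{x}^{(1)}$,
such that $(\mathbf{u},\mathbf{x}^{(2)}) \in J_n(\delta,p)$. Let $E(\mathbf{x}^{(2)})$ denote this event for a given
$\mathbf{x}^{(2)}$. Then

$$P[E(\mathbf{x}^{(2)})] \le (|S| - 1) P\{\mathbf{U} \in \mathcal{U}(\mathbf{x}^{(2)})\}$$

where

$$\mathcal{U}(\mathbf{x}^{(2)}) = \{\mathbf{u} \in T_n(\delta,p_1) : (\mathbf{u},\mathbf{x}^{(2)}) \in J_n(\delta,p)\}$$

and $\mathbf{U}$ is a random vector uniformly distributed on $T_n(\delta,p_1)$. The probability
of error associated with this event is the expectation of
$P[E(\mathbf{x}^{(2)})]$ over all $\delta$-typical $\mathbf{x}^{(2)}$ sequences. Denoting this probability
of error by P[E] we have

$$P[E] < |S| \sum_{\mathbf{x}^{(2)}} P\{\mathbf{U} \in \mathcal{U}(\mathbf{x}^{(2)})\} P\{\mathbf{X}^{(2)} = \mathbf{x}^{(2)}\}$$
$$\le |S| \exp_2[-n(H(p_2) - c)] \sum_{\mathbf{x}^{(2)}} P\{\mathbf{U} \in \mathcal{U}(\mathbf{x}^{(2)})\}$$
$$\le |S| \exp_2[-n(H(p_2) - c)] \sum_{\mathbf{x}^{(2)}} |\mathcal{U}(\mathbf{x}^{(2)})| \, |T_n(\delta,p_1)|^{-1}. \tag{11}$$

But

$$\sum_{\mathbf{x}^{(2)}} |\mathcal{U}(\mathbf{x}^{(2)})| = |J_n(\delta,p)|$$

so

$$P[E] \le |S| \exp_2[-n(H(p_2) - c)] \, |J_n(\delta,p)| \, |T_n(\delta,p_1)|^{-1}$$
$$< |S| \exp_2[n(H(p) - H(p_1) - H(p_2) + 3c)] \tag{12}$$

using the bounds on cardinality from Property 2 of the sets of typical
sequences. The constant c is defined in the Preliminaries. Since

$$|S| = \exp_2[n(H(p_1) - \hat{h}_1(p_1) - \gamma)], \tag{13}$$

then it follows from (5), (12), and (13) that

$$P[E] < \exp_2[n(h_1(p) - \hat{h}_1(p_1) + 3c - \gamma)]. \tag{14}$$

Since (6) implies $\hat{h}_1(p_1) \ge h_1(p)$, it follows from (14) that $P[E] \to 0$ as
$n \to \infty$ if

$$\gamma > 3c. \tag{15}$$

The constant c may be made arbitrarily small and does not depend on the
distribution p, so any $R_1 > \hat{h}_1(p_1)$ is achievable.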
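For a finite class the supremum in (6) is a maximum over the members whose first marginal matches the one the encoder observes, so the robust corner-point rate can be computed directly. The sketch below is illustrative only: the three-member class, the tolerance, and the function names are assumptions, not part of the thesis.

```python
import math

def first_marginal(joint):
    p1 = {}
    for (u, v), pr in joint.items():
        p1[u] = p1.get(u, 0.0) + pr
    return p1

def h1(joint):
    """Average conditional entropy of X1 given X2 (bits)."""
    p2 = {}
    for (u, v), pr in joint.items():
        p2[v] = p2.get(v, 0.0) + pr
    return -sum(pr * math.log2(pr / p2[v])
                for (u, v), pr in joint.items() if pr > 0)

def robust_rate(source_class, p1, tol=1e-9):
    """Largest h1 over the members of the class whose first marginal is p1."""
    def matches(pi):
        m = first_marginal(pi)
        return all(abs(m.get(u, 0.0) - p1.get(u, 0.0)) < tol
                   for u in set(m) | set(p1))
    return max(h1(pi) for pi in source_class if matches(pi))

def dsbs(beta):
    """Doubly symmetric binary source with crossover probability beta."""
    return {(0, 0): 0.5 * (1 - beta), (1, 1): 0.5 * (1 - beta),
            (0, 1): 0.5 * beta, (1, 0): 0.5 * beta}

# Hypothetical class: two sources with uniform X1, one with a skewed X1.
skewed = {(0, 0): 0.15, (0, 1): 0.15, (1, 0): 0.35, (1, 1): 0.35}
source_class = [dsbs(0.1), dsbs(0.25), skewed]
rate = robust_rate(source_class, {0: 0.5, 1: 0.5})
print(rate)
```

The skewed source has a larger conditional entropy than either symmetric source, but it is excluded because the encoder can rule it out from its own marginal; the rate is set by the worst remaining member.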


3.3 Converse to Coding Theorem

The rate of any code for source 1 can be expressed as

$$R_1 = n^{-1} \sum_{\mathbf{u}} P\{\mathbf{X}^{(1)} = \mathbf{u}\} \ell(\mathbf{u}) = n^{-1} \sum_{\mathbf{u}} \prod_{i=1}^{n} p_1(u_i) \, \ell(\mathbf{u}) \tag{16}$$

where $\ell(\mathbf{u})$ is the length (i.e., number of bits or symbols) of the codeword
for $\mathbf{u}$. This length function depends only on the code, so for a
given code the rate is a function of the marginal distribution $p_1$. All
sources $\pi \in A_1(p_1)$ have the same marginal $p_1$, hence any code has a single
rate $R_1 = R_1(p_1)$ for all of them. By the converse of the Slepian-Wolf
theorem applied to a particular source $\pi \in A_1(p_1)$,

$$R_1 \ge h_1(\pi), \tag{17}$$

and since this applies to all sources in $A_1(p_1)$,

$$R_1 = R_1(p_1) \ge \sup\{h_1(\pi) : \pi \in A_1(p_1)\} = \hat{h}_1(p_1) \tag{18}$$

which is the desired result.


CHAPTER  4 


ROBUST  CODING  FOR  THE  SIDE  INFORMATION  PROBLEM 

4.1  Statement  of  Problem  and  Result 

In this problem the output of source 1 must be reproduced with
arbitrarily low distortion by the decoder. Encoder 2 sends information
derived from source 2 at rate $R_2$ and this information is used to determine
the output of source 1, but reproduction of the output of source 2 is not required
(see Fig. 3). The solution to this problem is given in terms of an
auxiliary random variable Z. Let the joint distribution of $X^{(1)}$, $X^{(2)}$, and
Z be

$$q(u,v,w) = P\{X^{(1)} = u, X^{(2)} = v, Z = w\}$$

and the joint distribution of $X^{(i)}$ and Z be

$$q_i(u,v) = P\{X^{(i)} = u, Z = v\}$$

for i = 1,2. Also, define $p_z$ to be the marginal distribution of Z. In the
case where the joint distribution p of $X^{(1)}$ and $X^{(2)}$ is known precisely by
the encoders and the decoder the rate region is ([2], [3])

$$\mathcal{R} = \{(R_1,R_2) : \exists\, Z \ni X^{(1)} \to X^{(2)} \to Z \text{ and } R_1 \ge h_z(q_1),\ R_2 \ge I(q_2)\}. \tag{19}$$

Here $X^{(1)} \to X^{(2)} \to Z$ indicates that these random variables in the order
listed form a Markov chain (i.e., $X^{(1)}$ and Z are conditionally independent
given $X^{(2)}$),

$$h_z(q_1) = H(q_1) - H(p_z) \tag{20}$$

is the conditional entropy of $X^{(1)}$ given Z, and

$$I(q_2) \triangleq \sum_{u,v} q_2(u,v) \log \frac{q_2(u,v)}{p_2(u) p_z(v)} = H(p_2) - h_z(q_2)$$

is the mutual information between $X^{(2)}$ and Z. In all that follows, we
assume only that $p \in A$, that the i-th encoder knows the i-th marginal $p_i$,
and that the decoder knows p.

Since $X^{(1)} \to X^{(2)} \to Z$, then $q(x^{(1)},x^{(2)},z) = p(x^{(1)},x^{(2)}) w(z|x^{(2)})$,
and so the auxiliary random variable Z may be specified by a conditional
distribution w where $w(u|v) = P\{Z = u \mid X^{(2)} = v\}$. This conditional distribution
is chosen beforehand and is known to the encoders and the decoder.
Notice that $q_1$ is obtained from p and w by

$$q_1(x^{(1)},z) = \sum_{x^{(2)}} p(x^{(1)},x^{(2)}) w(z|x^{(2)}). \tag{21}$$

Under these assumptions the rate region is

$$\hat{\mathcal{R}} = \{(R_1,R_2) : \exists\, w \ni R_1 \ge \hat{h}_z(p_1),\ R_2 \ge I(q_2)\} \tag{22}$$

where

$$\hat{h}_z(p_1) \triangleq \sup\{h_z(q_1) : \pi \in A_1(p_1)\} \tag{23}$$

and

$$A_1(p_1) = \{\pi \in A : \pi_1 = p_1\}.$$

Note that $\hat{h}_z(p_1)$ is defined for a fixed w, and so $q_1$ is a function of $\pi$
as indicated in (21) with p replaced by $\pi$.
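The quantities entering the rate region can be computed mechanically from a joint distribution p and a test channel w. The following sketch follows (20) and (21) under illustrative assumptions: the source is a doubly symmetric binary source with crossover 0.1, the test channel from the second source to Z is a binary symmetric channel with crossover 0.2, and all names are hypothetical.

```python
import math

def entropy(pmf):
    """Entropy in bits of a pmf given as a dict of probabilities."""
    return -sum(pr * math.log2(pr) for pr in pmf.values() if pr > 0)

def side_info_quantities(p, w):
    """p[(u, v)]: joint pmf of (X1, X2); w[(z, v)] = P{Z = z | X2 = v}."""
    q1, q2, pz, p2 = {}, {}, {}, {}
    for (u, v), pr in p.items():
        p2[v] = p2.get(v, 0.0) + pr
        for (z, v_cond), wz in w.items():
            if v_cond != v:
                continue
            mass = pr * wz                             # q(u, v, z) = p(u, v) w(z|v)
            q1[(u, z)] = q1.get((u, z), 0.0) + mass    # equation (21)
            q2[(v, z)] = q2.get((v, z), 0.0) + mass
            pz[z] = pz.get(z, 0.0) + mass
    hz_q1 = entropy(q1) - entropy(pz)                  # equation (20)
    hz_q2 = entropy(q2) - entropy(pz)
    i_q2 = entropy(p2) - hz_q2                         # mutual information of X2 and Z
    return q1, hz_q1, i_q2

# Hypothetical source and test channel.
p = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}
w = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.8}
q1, hz_q1, i_q2 = side_info_quantities(p, w)
print(hz_q1, i_q2)
```

With these choices the cascade from the first source to Z behaves like a binary symmetric channel with crossover 0.26, so the rate needed by encoder 1 is the binary entropy of 0.26 while encoder 2 spends one bit minus the binary entropy of 0.2.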

4.2  Positive  Coding  Theorem 

The proof here is that of Berger ([5], Theorem 5.1) with a few modifications,
as in the positive coding theorem for the Slepian-Wolf problem
in section 3.2. Encoder 2 knows $p_2$ and w so it may encode exactly as if
the distributions were known. This is done as follows. Let D be a subset
of $T_n(\delta,p_z)$ of size

$$|D| = \exp_2[n(I(q_2) + f(\delta))], \tag{24}$$

for which the sequences $\mathbf{z}$ are chosen from $T_n(\delta,p_z)$ according to a uniform
distribution. In (24), $f(\delta)$ does not depend on p since $p(u) \ge \alpha > 0$, and
$f(\delta) \to 0$ as $\delta \to 0$. It can be shown (Lemma 2.1.3 of [5]) that if $\mathbf{x}^{(2)}$ is
any $\delta$-typical output sequence from source 2, there will be at least one
$\mathbf{z} \in D$ such that $\mathbf{x}^{(2)}$ and $\mathbf{z}$ are jointly $\delta$-typical w.p. $\to 1$ (w.p. $\to a$ will
be used to indicate "with probability $\to a$ as $n \to \infty$"). Let D be such a set.
If the source output is $\mathbf{x}^{(2)}$, encoder 2 simply sends the smallest index
which corresponds to a $\mathbf{z}$ which is in D and has the property that
$(\mathbf{x}^{(2)},\mathbf{z}) \in J_n(\delta,q_2)$, and sends index 0 if there is no such $\mathbf{z}$. Then we
have

$$R_2 = I(q_2) + f(\delta) \tag{25}$$

where $f(\delta) \to 0$ as $\delta \to 0$.

Encoder 1 first determines $\hat{h}_z(p_1)$ using A, $p_1$, and w (see (23)).
Next

$$N = \exp_2[n(\hat{h}_z(p_1) + 2\gamma)] \tag{26}$$

sets of $\mathbf{x}^{(1)}$ sequences, say $S_1, \ldots, S_N$, are formed by selecting sequences
independently from $T_n(\delta,p_1)$ according to a uniform distribution, as in
section 3.2. Here each set contains

$$|S| = \exp_2[n(H(p_1) - \hat{h}_z(p_1) - \gamma)] \tag{27}$$

$\delta$-typical sequences. The index $i(\mathbf{x}^{(1)})$ is defined by

$$i(\mathbf{x}^{(1)}) = \begin{cases} j & \text{if } j = \min\{k : \mathbf{x}^{(1)} \in S_k\} \\ 0 & \text{if } \mathbf{x}^{(1)} \notin S_k,\ k = 1,\ldots,N \end{cases}$$

and this index is sent to the decoder requiring a rate $R_1 = \hat{h}_z(p_1) + 2\gamma$.
With N and $|S|$ as defined in (26) and (27), the derivation of equations
(9) and (10) in section 3.2 holds, showing that $P\{i(\mathbf{X}^{(1)}) = 0\} \to 0$ as
$n \to \infty$.

Since $P\{(\mathbf{X}^{(1)},\mathbf{X}^{(2)}) \in J_n(\delta,p)\} \to 1$ as $n \to \infty$ we may assume that the
source outputs $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ are jointly typical, i.e., $(\mathbf{x}^{(1)},\mathbf{x}^{(2)}) \in J_n(\delta,p)$.
The decoder observes a sequence $\mathbf{z}$ jointly $\delta$-typical with $\mathbf{x}^{(2)}$
and knows an index j such that $\mathbf{x}^{(1)} \in S_j$. Lemma 4.1 of [5] ("Markov
Lemma") states

$$X^{(1)} \to X^{(2)} \to Z \text{ and } (\mathbf{x}^{(2)},\mathbf{z}) \in J_n(\delta,q_2) \Rightarrow (\mathbf{x}^{(1)},\mathbf{z}) \in J_n(\delta|\mathcal{X}_1|,q_1)$$

w.p. $\to 1$. The decoding procedure is to search $S_j$ for a sequence $\mathbf{u}$ such
that $(\mathbf{u},\mathbf{z}) \in J_n(\delta_1,q_1)$, where $\delta_1 = \delta|\mathcal{X}_1|$. The set $S_j$ contains such a $\mathbf{u}$
w.p. $\to 1$ by the above lemma. Let $E_z$ be the event that at least one of
the $|S| - 1$ sequences $\mathbf{u} \in S_j$ other than the sequence $\mathbf{x}^{(1)}$ (the actual
source output) is $\delta_1$-typical with $\mathbf{z}$. If an error occurs then either $E_z$
occurs or else there is no $\mathbf{u} \in S_j$ such that $(\mathbf{u},\mathbf{z}) \in J_n(\delta_1,q_1)$. The
latter event has probability $\to 0$ asymptotically as we mentioned above, and
the event $E_z$ has probability upper bounded by

$$P(E_z) \le (|S| - 1) P(\mathbf{U} \in \mathcal{U}_z), \tag{28}$$

where $\mathbf{U}$ is a random vector which is uniform on $T_n(\delta,p_1)$ and

$$\mathcal{U}_z = \{\mathbf{u} : (\mathbf{u},\mathbf{z}) \in J_n(\delta_1,q_1)\}.$$

Since $\mathbf{U}$ is uniform on $T_n(\delta,p_1)$ we have

$$P\{\mathbf{U} \in \mathcal{U}_z\} \le |\mathcal{U}_z| \, |T_n(\delta,p_1)|^{-1}. \tag{29}$$

By a slight modification of Lemma 2.1.2 of [5] (reversing signs in expressions
of the form $p(u) + \delta|\mathcal{X}_1|^{-1}$ and $p(u,v) + \delta|\mathcal{X}_1|^{-1}|\mathcal{X}_2|^{-1}$) or by Lemma 2.1.6
of [11],

$$|\mathcal{U}_z| \le \exp_2\{n[h_z(q_1) + f'(\delta_1)]\} \tag{30}$$

where $f'(\delta_1) \to 0$ as $\delta_1 \to 0$ and $f'(\delta_1)$ is not a function of p. From
Property 2 of the $\delta$-typical sets

$$|T_n(\delta,p_1)|^{-1} \le \exp_2[-n(H(p_1) - c)], \tag{31}$$

so

$$P(\mathbf{U} \in \mathcal{U}_z) \le \exp_2[n(h_z(q_1) - H(p_1) + f'(\delta_1) + c)]. \tag{32}$$

Substituting (32) into (28) we get

$$P[E_z] < |S| \exp_2[n(h_z(q_1) - H(p_1) + f'(\delta_1) + c)]$$

and using (27)

$$P[E_z] < \exp_2[n(h_z(q_1) - \hat{h}_z(p_1) + f'(\delta_1) + c - \gamma)] \tag{33}$$
$$\le \exp_2[n(f'(\delta_1) + c - \gamma)] \tag{34}$$

where (34) follows from the definition of $\hat{h}_z(p_1)$. So $P[E_z] \to 0$ as $n \to \infty$
if $\gamma > c + f'(\delta_1)$ for any $\mathbf{z} \in D$. The constants c and $f'(\delta_1)$ may be made
arbitrarily small by choice of $\delta$, hence any $R_1 > \hat{h}_z(p_1)$ is achievable and
the rate region is as given by (22).

4.3  Converse  to  Coding  Theorem 

The converse to this result is exactly the same as for the result of
Chapter 3. Only a single rate $R_1$ is possible for all $\pi \in A_1(p_1)$, and $R_1 \ge
h_z(q_1)$ for each $\pi$ (and its associated $q_1$) in the class by the converse to
the side information problem; hence $R_1 \ge \sup\{h_z(q_1) : \pi \in A_1(p_1)\} = \hat{h}_z(p_1)$.
CHAPTER 5

THE  WYNER-ZIV  PROBLEM 

5.1  Introduction 

Encoder 2 sends at rate $R_2 > H(p_2)$, the entropy of source 2, so the
decoder may be assumed to have $\{X_k^{(2)}\}$ exactly; therefore only the coding
for source 1 need be considered. It is desired to determine the rate
distortion function $R^*(D)$ for source 1; that is, the minimum rate $R_1$ such
that the decoder can produce an r.v. $\hat{X}^{(1)}$ which satisfies $E[d(X^{(1)},\hat{X}^{(1)})] \le
D$. The distortion measure is assumed to be a finite single letter distortion
measure on $\mathcal{X}_1 \times \hat{\mathcal{X}}_1$ which satisfies $d(u,u) = 0$ and $d(u,v) > 0$ for
$u \ne v$ (see Fig. 4). Again an auxiliary random variable Z is required,
so let q be the joint distribution of $X^{(1)}$, $X^{(2)}$, and Z and let $q_i$ be the
joint distribution of $X^{(i)}$, Z induced by q.

Figure 4. Configuration for the Wyner-Ziv problem.

If the joint distribution of $X^{(1)}$ and $X^{(2)}$ is known precisely then
the rate distortion function is [4],

$$R^*(D) = \inf\{I(q_1) - I(q_2) : q \in Q(D)\} \tag{35}$$

where

$$Q(D) = \{q : X^{(2)} \to X^{(1)} \to Z \text{ and } \exists\, f : \mathcal{X}_2 \times \mathcal{Z} \to \hat{\mathcal{X}}_1 \ni E\{d(X^{(1)}, f(X^{(2)},Z))\} \le D\} \tag{36}$$

and $I(q_i)$ is the mutual information between $X^{(i)}$ and Z computed using the
distribution $q_i$. Now any auxiliary random variable Z with joint distribution
$q \in Q(D)$ may be described by a conditional distribution w where
$w(u|v) = P\{Z = u \mid X^{(1)} = v\}$, since $q(x^{(1)},x^{(2)},z) = p(x^{(1)},x^{(2)}) w(z|x^{(1)})$ by
the Markov property. So an alternative description of the rate distortion
function is

$$R^*(D) = \inf\{I(q_1) - I(q_2) : w \in P(p,D)\} \tag{37}$$

where

$$P(p,D) = \Big\{w : \sum_{v,z} F(v,z) \le D\Big\} \tag{38}$$

is defined in terms of the function F which is given by

$$F(v,z) = \min\Big\{\sum_u w(z|u) p(u,v) d(u,\hat{u}) : \hat{u} \in \hat{\mathcal{X}}_1\Big\}. \tag{39}$$

These descriptions are equivalent, for if $w \in P(p,D)$ then define the
function $f(v,z) = u^*$ where $u^*$ is such that

$$\sum_u P\{X^{(1)} = u \mid X^{(2)} = v, Z = z\} d(u,u^*) = \min\Big\{\sum_u P\{X^{(1)} = u \mid X^{(2)} = v, Z = z\} d(u,\hat{u}) : \hat{u} \in \hat{\mathcal{X}}_1\Big\} \tag{40}$$

and this f will satisfy $E\{d(X^{(1)}, f(X^{(2)},Z))\} \le D$ by the definition of
$P(p,D)$. Conversely if $q \in Q(D)$ then the corresponding $w \in P(p,D)$ because
f as defined by (40) minimizes the contribution of each (v,z) pair ($v \in \mathcal{X}_2$,
$z \in \mathcal{Z}$) to the distortion, which minimizes the total distortion.

Now assume that the encoder knows only that $p \in A$ and define

$$W(D) = \bigcap_{\pi \in A_1(p_1)} P(\pi,D) \tag{41}$$

where

$$A_1(p_1) = \{\pi \in A : \pi_1 = p_1\}. \tag{42}$$

Then the set

$$\mathcal{R} = \{(R_1,D) : R_1 \ge \inf\{\sup\{I(q_1) - I(q_2) : \pi \in A_1(p_1)\} : w \in W(D)\}\} \tag{43}$$

is achievable. This is clear since by the definition of W(D) the decoder
can find an r.v. $\hat{X}^{(1)}$ with $E\{d(X^{(1)},\hat{X}^{(1)})\} \le D$ once it has Z, and the
random coding proof of the Wyner-Ziv result in [5] shows that any $R_1 \ge
I(q_1) - I(q_2)$ allows Z to be decoded. For a given w, $R_1 \ge \sup\{I(q_1) -
I(q_2) : \pi \in A_1(p_1)\}$, so Z may be recovered for any $\pi \in A_1(p_1)$ and $\mathcal{R}$ is
achievable. However, $\mathcal{R}$ does not necessarily contain any point on any of
the $R_\pi^*(D)$ curves for individual $\pi \in A_1(p_1)$. For any given w, the distribution
$\pi$ which achieves the maximum rate need not also have the worst
distortion. So the above result does not establish the existence of
robust codes.

The following example shows that in fact robust codes do not in
general exist for the Wyner-Ziv problem. That is, in general it is not
possible to construct a code for any class of sources which achieves a
point (R,D) on the rate-distortion function $R_\pi^*(D)$ for one of the sources
and distortion no greater than D for the other sources.

5.2  Counter  Example  for  Robust  Coding 

Here  \Z^\  =  2  (i  =  1,2).  The  class  A  is  composed  of  two  pairs  of 
sources.  One  has  distribution  n  given  by  tt(0,0)  =  n(l,l)  =  *2(1 -B)  and 
tt(1,0)  =  tt(0,1)  =  k$s  •  The  other  has  distribution  tt  which  is  given  by 
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tt(0,0)  =  %(1-P  ),  tt(0,1)  =  HZ  ,  rf(l,0)  =  0  and  ff(l,l)  =  In  these 
z  z 

definitions  g  and  g  are  constants  in  [0,1].  The  distribution  tt 
s  z 

corresponds  to  a  doubly  symmetric  binary  source  (DSBS)  and  n  is  defined 
by  a  "Z-channel"  between  and  a  "Z-channel'1  being  one  with  only 

a  single  possible  crossover  (Fig.  5).  Note  that  rr^  «  tt^  so 
A  -  A1(tt1)  =  A1(tt1). 

Two different conditional distributions of Z given $X^{(1)}$ are necessary
and these are denoted by $w$ and $\tilde w$. The distributions $\pi$,
$\tilde\pi$, $w$, and $\tilde w$ are used to define three joint
distributions on $X^{(1)}$, $X^{(2)}$ and Z as below

$$q(u,v,z) = \pi(u,v)\,w(z|u) \tag{44a}$$

$$q'(u,v,z) = \tilde\pi(u,v)\,w(z|u) \tag{44b}$$

$$q''(u,v,z) = \tilde\pi(u,v)\,\tilde w(z|u) \tag{44c}$$

for $u \in \mathcal{X}_1$, $v \in \mathcal{X}_2$ and $z \in \mathcal{Z}$. Joint
distributions induced on $X^{(i)}$ and Z by these distributions are denoted
by $q_i$, $q_i'$, and $q_i''$ respectively. For the conditional distribution
$w$ and source $\pi$, if $q(u,v,z) = \pi(u,v)w(z|u)$ we may define

$$r(\pi,w) \triangleq \mathcal{I}(q_1) - \mathcal{I}(q_2) \tag{45}$$

and

$$\delta(\pi,w) \triangleq \sum_{v,z} \min\Big[\sum_u q(u,v,z)\,d(u,\hat u) : \hat u \in \hat{\mathcal{X}}_1\Big]. \tag{46}$$

So $r(\pi,w)$ and $\delta(\pi,w)$ are the rate and distortion achieved by a
source $\pi$ and conditional distribution $w$.
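
The two quantities in (45) and (46) are easy to evaluate numerically for finite alphabets. The sketch below is one possible implementation, assuming the source is stored as an array `pi[u, v]`, the test channel as `w[u, z]` (the probability of z given that the first source emits u), and the distortion measure as `d[u, u_hat]`; all function names are hypothetical. It is checked on a DSBS with a BSC test channel, where the rate reduces to a difference of binary entropies.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in bits of a joint pmf given as a 2-D array."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (pa * pb)[mask])).sum())

def joint_q(pi, w):
    """q[u, v, z] = pi[u, v] * w[u, z]."""
    return pi[:, :, None] * w[:, None, :]

def rate(pi, w):
    """Eq. (45): information between (X1, Z) minus information between (X2, Z)."""
    q = joint_q(pi, w)
    return mutual_information(q.sum(axis=1)) - mutual_information(q.sum(axis=0))

def distortion(pi, w, d):
    """Eq. (46): best reproduction chosen separately for every pair (v, z)."""
    cost = np.einsum('uvz,ux->vzx', joint_q(pi, w), d)  # cost[v, z, u_hat]
    return float(cost.min(axis=2).sum())

def binary_entropy(x):
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

hamming = 1.0 - np.eye(2)
pi = np.array([[0.45, 0.05], [0.05, 0.45]])    # DSBS with crossover 0.1
bsc = np.array([[0.95, 0.05], [0.05, 0.95]])   # BSC with crossover 0.05
r = rate(pi, bsc)
delta = distortion(pi, bsc, hamming)
print(r, delta)
```

For this pair the distortion equals the BSC crossover probability, and the rate equals the binary entropy of the cascaded crossover 0.14 minus that of 0.05.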

To achieve a particular point on the rate distortion curve $R^*(D)$
for $\pi$ (the DSBS) the conditional distribution of Z given $X^{(1)}$ must
correspond to a specific binary symmetric channel (BSC) with a fixed
level of time sharing. Any other distribution will do strictly worse.
Examination of the derivation of $R^*(D)$ for the DSBS in [4] makes this
clear (see Appendix B).

Now fix $\beta_s = .1$ (which defines $\pi$) and solve for $d^*$, the
distortion at which time sharing begins for source $\pi$. Solving equation
(26) of [4] numerically, we find $d^* = .00752$ and $R^*_\pi(d^*) = 0.4238$.
($R^*_\pi(D)$ is the Wyner-Ziv rate distortion function for source $\pi$.)
Then let the test channel $w$ be a BSC with crossover probability $d^*$ so
that $\delta(\pi,w) = d^*$; the auxiliary r.v. Z must then be defined by this
test channel and some fixed level of time sharing. The distortion at rate
equal to zero is $\beta_s = .1$ for $\pi$, so $R^*_\pi(D)$ for $D \ge d^*$ is
as in Fig. 5. Next find a value $\beta_z$ (which defines $\tilde\pi$) such that

$$r(\tilde\pi,w) = r(\pi,w). \tag{47}$$

The value of $\beta_z$ satisfying this constraint is .256, and
$\delta(\tilde\pi,w) = .00472$ is the distortion which results when test
channel $w$ is used with source $\tilde\pi$. The distortion for $\tilde\pi$
at rate equal to zero is $\tfrac12\beta_z = .128$ so the time sharing
performance of $w$ and $\tilde\pi$ is as shown in Fig. 5.
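
The threshold at which time sharing begins can be reproduced with a few lines of Python. The sketch below assumes that the condition being solved is the usual tangency condition for the DSBS: the tangent to $G(d) = h(p_0 * d) - h(d)$ at the threshold passes through the zero-rate point $(p_0, 0)$. That reading of equation (26) of [4], the bisection method, and all function names are assumptions made here.

```python
import math

def h(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def G(d, p0):
    """h(p0 * d) - h(d), with p0 * d the cascaded crossover probability."""
    return h(p0 * (1 - d) + (1 - p0) * d) - h(d)

def G_prime(d, p0):
    x = p0 * (1 - d) + (1 - p0) * d
    return (1 - 2 * p0) * math.log2((1 - x) / x) - math.log2((1 - d) / d)

def tangent_gap(d, p0):
    """Zero exactly when the tangent at d passes through (p0, 0)."""
    return G(d, p0) - G_prime(d, p0) * (d - p0)

def time_sharing_threshold(p0, tol=1e-12):
    lo, hi = 1e-9, p0          # the gap is negative near 0 and positive at p0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tangent_gap(mid, p0) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d_star = time_sharing_threshold(0.1)
print(d_star, G(d_star, 0.1))
```

With $p_0 = .1$ the two printed numbers should agree with the values of the threshold distortion and rate quoted in the text to the precision given there.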

Next we wish to find a test channel $\tilde w$ which yields lower
distortion at the same rate for source $\tilde\pi$. If we define
$\tilde w(1|0) = P\{Z = 1 | X^{(1)} = 0\} = .01465$ and $\tilde w(0|1) = .00465$
then $\delta(\tilde\pi,\tilde w) = .0042 < \delta(\tilde\pi,w)$ and
$r(\tilde\pi,\tilde w) = r(\tilde\pi,w)$ as desired. Note that the pair
$[r(\tilde\pi,\tilde w), \delta(\tilde\pi,\tilde w)]$ is not necessarily on
the rate distortion curve for $\tilde\pi$, but it is an upper bound; that is,

$$R^*_{\tilde\pi}[\delta(\tilde\pi,\tilde w)] \le r(\tilde\pi,\tilde w). \tag{48}$$
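
The numbers quoted for the two test channels can be checked directly. The following sketch evaluates rate and distortion for the Z-channel source with crossover constant .256 under both the BSC with crossover .00752 and the asymmetric channel with crossovers .01465 and .00465. The array conventions (`pi[u, v]`, `w[u, z]`) and helper names are illustrative assumptions, and the rounded constants are taken from the text, so agreement is only expected to a few decimal places.

```python
import numpy as np

def mutual_information(joint):
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (pa * pb)[mask])).sum())

def rate(pi, w):
    q = pi[:, :, None] * w[:, None, :]          # q[u, v, z]
    return mutual_information(q.sum(axis=1)) - mutual_information(q.sum(axis=0))

def distortion(pi, w, d):
    q = pi[:, :, None] * w[:, None, :]
    return float(np.einsum('uvz,ux->vzx', q, d).min(axis=2).sum())

hamming = 1.0 - np.eye(2)
d_star = 0.00752
pi_dsbs = np.array([[0.45, 0.05], [0.05, 0.45]])
pi_z = np.array([[0.5 * (1 - 0.256), 0.5 * 0.256], [0.0, 0.5]])
w_bsc = np.array([[1 - d_star, d_star], [d_star, 1 - d_star]])
w_alt = np.array([[1 - 0.01465, 0.01465], [0.00465, 1 - 0.00465]])

results = {name: (rate(pi_z, w), distortion(pi_z, w, hamming))
           for name, w in (("bsc", w_bsc), ("alt", w_alt))}
for name, (r, dist) in results.items():
    print(name, round(r, 4), round(dist, 5))
```

The asymmetric channel gives a visibly smaller distortion for the Z-channel source at essentially the same rate, which is the property the counterexample relies on.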
Referring to Fig. 5, we let $d_i$ denote the distortion and $r_i$ denote
the rate of the point corresponding to the intersection of the
$(\tilde w - \tilde\pi)$ curve and the $(w - \pi)$ curve. Test channel $w$
applied to source $\tilde\pi$ achieves distortion $\hat d > d_i$ at rate $r_i$.

Let $\bar D = \max\{d(u,v) : u \in \mathcal{X}_1, v \in \hat{\mathcal{X}}_1\}$
and define a conditional distribution distance $d_s(w',w'')$ by

$$d_s(w',w'') = \sum_{u,v} |w'(u|v) - w''(u|v)|. \tag{49}$$

Pick $\epsilon > 0$ such that $\hat d - d_i > \epsilon\bar D > 0$. If a new
auxiliary r.v. Z has $w'$ as its conditional distribution given $X^{(1)}$
and $w'$ is such that $d_s(w',w) > \epsilon$, then this r.v. will have
distortion strictly greater than $d_i$ (at rate $r_i$) when used for source
$\pi$ (the DSBS) by the proof in Appendix B. Yet if $d_s(w',w) \le \epsilon$
then the distortion achieved for source $\tilde\pi$ is

$$\sum_{v,z} \min\Big[\sum_u w'(z|u)\,\tilde\pi(u,v)\,d(u,\hat u) : \hat u \in \hat{\mathcal{X}}_1\Big] \ge \hat d - \epsilon\bar D > d_i. \tag{50}$$


In either case the distortion for one of the two sources
$\{\pi,\tilde\pi\}$ exceeds $d_i$, hence no single r.v. can achieve
distortion $\le d_i$ for both sources at rate $r_i$. This does imply that
no robust code exists, as can be seen from the converse to the Wyner-Ziv
result (Section III of [4]). If a sequence of codes of increasing block
length n exists which achieves asymptotically a distortion $d_i$ at rate
$r_i$ for both $\pi$ and $\tilde\pi$ then eq. (55) of [4] holds and so there
exists a sequence of random variables $\{Z_j^{(n)} : 1 \le j \le n\}$ which
satisfy $Z_j^{(n)} \to X_j^{(1)} \to X_j^{(2)}$ and achieve distortions
$\Delta_j^{(n)}$ such that

$$\lim_{n\to\infty} n^{-1} \sum_{j=1}^{n} \Delta_j^{(n)} = d_i. \tag{51}$$

Let $w_j^{(n)}$ be the conditional distribution corresponding to
$Z_j^{(n)}$. By the proof in Appendix B, the point $(r_i,d_i)$ is achieved
for source $\pi$ by a unique conditional distribution $w$ and time-sharing
parameter $\theta \in [0,1]$. Hence the sequence of conditional
distributions $w_j^{(n)}$ must approach this conditional distribution $w$
with a particular level of time sharing. But this possibility has been
eliminated, because $w$ with time sharing achieves a distortion
$\hat d > d_i$ for source $\tilde\pi$. Hence no robust code exists.


5.3 Special Case where Robust Coding is Possible

If the problem is restricted to sets A with the property that there
is one worst source $\pi^*$ such that
$\pi^*(u,v) = \sum_y \pi(u,y)\,c_\pi(v|y)$ for all $\pi \in A$, where $c_\pi$
is a conditional distribution, then a robust coding technique similar to
that of the side information case may be obtained. The coding method in
this case is simply to code for the worst source $\pi^*$. Let q be a joint
distribution of $X^{(1)}$, $X^{(2)}$ and Z such that the distribution of
$X^{(1)}$ and $X^{(2)}$ induced by q is $\pi^*$. Then the (R,D) pairs
achieved are given by

$$R(D) = \inf\{\mathcal{I}(q_1) - \mathcal{I}(q_2) : w \in W(D)\} \tag{52}$$

where $q(u,v,z) = \pi^*(u,v)\,w(z|u)$ and

$$W(D) = \Big\{w : \sum_{v,z} \min\Big[\sum_u q(u,v,z)\,d(u,\hat u) : \hat u \in \hat{\mathcal{X}}_1\Big] \le D\Big\}. \tag{53}$$


Now for any $\pi \in A$ there exists a $c_\pi$ such that
$\sum_y \pi(u,y)\,c_\pi(v|y) = \pi^*(u,v)$. Write
$q^\pi(u,v,z) = \pi(u,v)\,w(z|u)$ for the joint distribution obtained when
$\pi$ replaces $\pi^*$. Then $\mathcal{I}(q^\pi_2) \ge \mathcal{I}(q_2)$ and
so $\mathcal{I}(q^\pi_1) - \mathcal{I}(q^\pi_2) \le \mathcal{I}(q_1) - \mathcal{I}(q_2)$
for fixed $w$ (note that $\mathcal{I}(q^\pi_1) = \mathcal{I}(q_1)$). The r.v.
Z defined by $w$ may be recovered if $R \ge \mathcal{I}(q_1) - \mathcal{I}(q_2)$
so the decoder may recover Z regardless of which $\pi \in A$ is the true
distribution.

The decoder can derive a function
$f : \mathcal{X}_2 \times \mathcal{Z} \to \hat{\mathcal{X}}_1$ such that
$E\{d(X^{(1)}, f(X^{(2)},Z))\} \le D$ as follows. Given $\pi^*$ and $\pi$, a
$c_\pi$ can be derived satisfying $\sum_y \pi(u,y)\,c_\pi(v|y) = \pi^*(u,v)$.
Then, using $c_\pi$, an r.v. $\tilde X^{(2)}$ may be derived from $X^{(2)}$
such that the joint distribution of $X^{(1)}$ and $\tilde X^{(2)}$ is $\pi^*$.
Then the decoding function for source $\pi^*$ will yield the desired result.
Since source $\pi$ is better than $\pi^*$ in some sense, this technique may
not yield the minimum distortion, and the actual distortion achieved may be
less than that of the worst source.
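
The degradation relation between a source and the worst source can be illustrated with a small Python sketch. Here a DSBS with crossover .1 is passed through a BSC with crossover .125 acting on the second component, which yields a DSBS with crossover .2; for a fixed test channel the rate required for the degraded source is then no smaller. The array conventions (`pi[u, v]`, `c[y, v]`, `w[u, z]`), the sample numbers, and the function names are assumptions chosen for this illustration.

```python
import numpy as np

def mutual_information(joint):
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (pa * pb)[mask])).sum())

def rate(pi, w):
    q = pi[:, :, None] * w[:, None, :]          # q[u, v, z] = pi[u, v] w[u, z]
    return mutual_information(q.sum(axis=1)) - mutual_information(q.sum(axis=0))

def degrade(pi, c):
    """pi_star[u, v] = sum over y of pi[u, y] * c[y, v]."""
    return pi @ c

def dsbs(theta):
    return np.array([[0.5 * (1 - theta), 0.5 * theta],
                     [0.5 * theta, 0.5 * (1 - theta)]])

pi = dsbs(0.1)
c = np.array([[0.875, 0.125], [0.125, 0.875]])  # BSC acting on the second source
pi_star = degrade(pi, c)

w = np.array([[0.95, 0.05], [0.05, 0.95]])      # an arbitrary fixed test channel
print(pi_star)
print(rate(pi, w), rate(pi_star, w))
```

Coding at the rate required by the degraded source therefore suffices for the original source as well, which is the argument used in the text.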

The performance of the robust coding can be given for the class
$A = \{\mathrm{DSBS}(\theta) : \theta \in [\theta_1,\theta_2]\}$ with Hamming
distortion measure and $0 < \theta_1 < \theta_2 < \tfrac12$. The source
DSBS($\theta_2$) is the worst source $\pi^*$, and its rate distortion
function may be achieved by the coding technique of part IIa of [4]. If the
source in effect is $\pi^*$ then the $R^*_{\pi^*}(D)$ curve is achieved. This
is the solid line in Fig. 6. For the sources DSBS($\theta$) with
$\theta < \theta_2$ there are two cases. If $\theta > d^*$, the distortion at
which time sharing begins, then the performance is the same as for
DSBS($\theta_2$) up to $d^*$, and for $D \ge d^*$ the performance is better,
as the distortion at rate zero is $\theta$ (dotted curve in Fig. 6). The
maximum distortion for any source DSBS($\theta$) is $\theta$, so if
$\theta < d^*$ then the performance is given by the dashed curve in Fig. 6.
Similar performance is achieved in the case of the doubly symmetric M-ary
source, where tight bounds on $R^*(D)$ are known [6], as this is also a
totally ordered set.

5.4 Note on Evaluation of the Wyner-Ziv Rate Region

One of the difficulties in coding for the Wyner-Ziv problem is that
explicit formulas for the rate distortion function are known only for
doubly symmetric sources. Even numerical evaluation of the Wyner-Ziv
rate region is difficult as the constraint

$$\sum_{v,z} \min\Big\{\sum_u \pi(u,v)\,w(z|u)\,d(u,\hat u) : \hat u \in \hat{\mathcal{X}}_1\Big\} \le D \tag{54}$$

is not in general convex. For example suppose the source is a DSBS(.1)
and consider conditional distributions $w_1$, $w_2$, $w_3$ defined by
$w_1(0|0) = w_2(1|1) = .5$, $w_1(1|1) = w_2(0|0) = 1$, and

$$w_3(u|v) = \tfrac12[w_1(u|v) + w_2(u|v)].$$

Then the distortion achieved by $w_1$ and $w_2$ is .075, but the distortion
achieved by $w_3$ is .10. So the set of conditional distributions $w$ which
satisfy the constraint (54) with D = .075 is not a convex set.
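
The non-convexity example can be verified numerically. The sketch below evaluates the left side of (54) with Hamming distortion for the three conditional distributions on a DSBS(.1). The storage convention `w[u, z]` (row: source symbol, column: value of Z) and the function name `distortion` are assumptions made for this illustration.

```python
import numpy as np

def distortion(pi, w, d):
    """Left side of (54): sum over (v, z) of the best achievable expected distortion."""
    q = pi[:, :, None] * w[:, None, :]                  # q[u, v, z]
    return float(np.einsum('uvz,ux->vzx', q, d).min(axis=2).sum())

hamming = 1.0 - np.eye(2)
pi = np.array([[0.45, 0.05], [0.05, 0.45]])             # DSBS(.1)

w1 = np.array([[0.5, 0.5], [0.0, 1.0]])                 # w1(0|0) = .5, w1(1|1) = 1
w2 = np.array([[1.0, 0.0], [0.5, 0.5]])                 # w2(0|0) = 1,  w2(1|1) = .5
w3 = 0.5 * (w1 + w2)                                    # midpoint of w1 and w2

d1, d2, d3 = (distortion(pi, w, hamming) for w in (w1, w2, w3))
print(d1, d2, d3)
```

Both endpoints meet the constraint with D = .075 while their midpoint does not, so the feasible set cannot be convex.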

CHAPTER 6

ROBUST CODING FOR THE SLEPIAN-WOLF PROBLEM

6.1 Statement of Problem

The problem here is the same as that of Chapter 3 except that encoder
2 is not constrained to use $R_2 \ge \mathcal{H}(p_2)$. The outputs of both
sources must still be decoded with arbitrarily low probability of error.
As before the joint distribution p is in a class A, the i-th encoder knows
the i-th marginal $p_i$, and the decoder knows p. The set $A_i(p_i)$ is
defined by

$$A_i(p_i) = \{\pi \in A : \pi_i = p_i\}$$

as before, and we define

$$A(p) \triangleq \{\pi \in A : \pi_i = p_i,\ i = 1,2\}. \tag{55}$$

6.2  An  Outer  Bound  on  the  Achievable  Rate  Region 

For a given marginal any code must have a fixed rate. So for each
subset of sources A(p) there will be a single rate pair $(R_1,R_2)$. By the
converse to the source coding theorem $R_1 + R_2 \ge \mathcal{H}(\pi)$ for
every $\pi \in A$. So if the true distribution for the source is p then
$R_1 + R_2 \ge \sup\{\mathcal{H}(\pi) : \pi \in A(p)\}$. Defining

$$h_i(p_i) = \sup\{h_i(\pi) : \pi \in A_i(p_i)\}$$

we also know $R_i \ge h_i(p_i)$, where
$h_1(\pi) = \mathcal{H}(\pi) - \mathcal{H}(\pi_2)$ and
$h_2(\pi) = \mathcal{H}(\pi) - \mathcal{H}(\pi_1)$, by the robust coding
result for the corner points of the Slepian-Wolf region. So an outer bound
on the achievable rate region is

$$\mathcal{R} = \{(R_1,R_2) : R_1 + R_2 \ge \sup[\mathcal{H}(\pi) : \pi \in A(p)],\ R_i \ge \sup[h_i(\pi) : \pi \in A_i(p_i)],\ i = 1,2\}. \tag{56}$$

Now neither encoder can determine the set A(p), so the i-th encoder
cannot determine its rate from (56). However, since the decoder knows
the joint distribution p, any subset of $\mathcal{R}$ for which the i-th
encoder may determine $R_i$ is achievable by the random coding argument used
in Chapter 3. Note that the set $\mathcal{R}$ is a function of p, so we
actually have a family of sets $\mathcal{R}(p)$ which bound the achievable
rate regions for each source $p \in A$.
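
For a finite class of sources the three thresholds of the outer bound (56) can be computed directly. The sketch below is a minimal illustration, assuming joint distributions are stored as arrays `pi[u, v]` and that marginals are compared with a numerical tolerance; the function names and the two-source example class are hypothetical.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def marginal(pi, i):
    return pi.sum(axis=1) if i == 1 else pi.sum(axis=0)

def h_cond(pi, i):
    """h_i(pi): joint entropy minus the entropy of the other marginal."""
    return entropy(pi) - entropy(marginal(pi, 3 - i))

def outer_bound(cls, p):
    """Thresholds of (56) for a finite class `cls` when p is the true source."""
    same = lambda pi, i: np.allclose(marginal(pi, i), marginal(p, i))
    A1 = [pi for pi in cls if same(pi, 1)]
    A2 = [pi for pi in cls if same(pi, 2)]
    Ap = [pi for pi in A1 if same(pi, 2)]
    return {"sum": max(entropy(pi) for pi in Ap),
            "R1": max(h_cond(pi, 1) for pi in A1),
            "R2": max(h_cond(pi, 2) for pi in A2)}

def dsbs(theta):
    return np.array([[0.5 * (1 - theta), 0.5 * theta],
                     [0.5 * theta, 0.5 * (1 - theta)]])

cls = [dsbs(0.1), dsbs(0.2)]
bound = outer_bound(cls, cls[0])
print(bound)
```

Both sources in this example share uniform marginals, so neither encoder can tell them apart and every threshold is set by the less correlated source.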

6.3 Some Sets of Achievable Rates

Three different coding techniques which yield sets of achievable
rates are considered in this section. In the first technique $R_1 = k$ and
$R_2 \ge \sup\{\mathcal{H}(\pi) - k : \pi \in A_2(p_2)\}$, where k is a
constant chosen beforehand. If the source distribution is $\pi$ we must
have $R_1 \ge h_1(\pi)$ so k must satisfy

$$k \ge \sup\{h_1(\pi) : \pi \in A\}. \tag{57}$$

Also $R_2 \ge h_2(p_2)$ so if we define

$$\mathcal{R}_{k1} = \{(R_1,R_2) : R_1 \ge k,\ R_2 \ge \max(h_2(p_2), \sup[\mathcal{H}(\pi) - k : \pi \in A_2(p_2)])\} \tag{58}$$

where k satisfies (57), then $\mathcal{R}_{k1}$ is a subset of $\mathcal{R}$
and $R_i$ is a function only of $p_i$ so $\mathcal{R}_{k1}$ is achievable.
Reversing the roles of the encoders yields another set $\mathcal{R}_{k2}$.

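Another set of achievable rates is given by time sharing between the
corner points of the Slepian-Wolf region derived in Chapter 3. Let
$\beta \in [0,1]$ be the time sharing parameter selected beforehand and
define

$$\mathcal{R}_\beta = \{(R_1,R_2) : R_1 \ge \sup[R_1(\beta,\pi) : \pi \in A_1(p_1)],\ R_2 \ge \sup[R_2(1-\beta,\pi) : \pi \in A_2(p_2)]\} \tag{59}$$

where

$$R_i(\beta,\pi) = \beta\mathcal{H}(\pi_i) + (1-\beta)h_i(\pi).$$

The i-th encoder can clearly determine $R_i$ so we need only show
$\mathcal{R}_\beta \subset \mathcal{R}$. Note $A(p) \subset A_i(p_i)$ so if
$(R_1,R_2) \in \mathcal{R}_\beta$ then

$$R_1 \ge \sup\{R_1(\beta,\pi) : \pi \in A(p)\}$$

and likewise for $R_2$ with $1-\beta$ in place of $\beta$. Since
$R_1(\beta,\pi) + R_2(1-\beta,\pi) = \mathcal{H}(\pi)$, it follows that

$$R_1 + R_2 \ge \sup\{\mathcal{H}(\pi) : \pi \in A(p)\}.$$

Also since $0 \le \beta \le 1$

$$R_i \ge \sup\{h_i(\pi) : \pi \in A_i(p_i)\}$$

and so $\mathcal{R}_\beta \subset \mathcal{R}$.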
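
The inclusion argument rests on the fact that the two time-shared rates always add up to the joint entropy. The sketch below checks this identity numerically for an arbitrary (hypothetical) joint distribution, assuming the time-shared rate is the weighted combination of the marginal entropy and the conditional entropy defined above; the function names are illustrative.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def marginal(pi, i):
    return pi.sum(axis=1) if i == 1 else pi.sum(axis=0)

def h_cond(pi, i):
    return entropy(pi) - entropy(marginal(pi, 3 - i))

def rate_i(beta, pi, i):
    """Time-shared rate: beta * H(pi_i) + (1 - beta) * h_i(pi)."""
    return beta * entropy(marginal(pi, i)) + (1 - beta) * h_cond(pi, i)

pi = np.array([[0.4, 0.1], [0.2, 0.3]])   # an arbitrary joint pmf for illustration
gaps = [rate_i(b, pi, 1) + rate_i(1 - b, pi, 2) - entropy(pi)
        for b in (0.0, 0.3, 0.5, 1.0)]
print(gaps)
```

Every entry of `gaps` is zero up to rounding, so a rate pair that dominates the time-shared rates for all sources in A(p) also satisfies the sum-rate constraint of the outer bound.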

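A third technique is to choose an $\alpha > 0$ ($\alpha$ will represent the
ratio $R_2/R_1$), and define

$$\beta(\pi) = \begin{cases} 1 & \text{if } \alpha \le h_2(\pi)/\mathcal{H}(\pi_1) \\ 0 & \text{if } \alpha \ge \mathcal{H}(\pi_2)/h_1(\pi) \\ \dfrac{\mathcal{H}(\pi_2) - \alpha h_1(\pi)}{\alpha\mathcal{H}(\pi_1) - h_2(\pi) + \mathcal{H}(\pi_2) - \alpha h_1(\pi)} & \text{otherwise.} \end{cases}$$

Then the set

$$\mathcal{R}_\alpha = \{(R_1,R_2) : R_1 \ge \sup[R_1(\beta(\pi),\pi) : \pi \in A_1(p_1)],\ R_2 \ge \sup[R_2(1-\beta(\pi),\pi) : \pi \in A_2(p_2)]\} \tag{62}$$

is achievable using the same reasoning as for $\mathcal{R}_\beta$. Regions
derived from these two techniques (for all possible $\alpha$ and $\beta$)
are sketched in Figure 7.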
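
A time-sharing weight that produces a prescribed rate ratio can be obtained by solving a linear equation. The sketch below assumes the convention that encoder 1 uses weight b on its marginal entropy and encoder 2 uses weight 1 - b, sets the second rate equal to the chosen ratio times the first, and solves for b, clipping to the corner points when the ratio lies outside the attainable range. This derivation, the function names, and the sample distribution are assumptions made for illustration rather than a transcription of the formula in the text.

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def marginal(pi, i):
    return pi.sum(axis=1) if i == 1 else pi.sum(axis=0)

def h_cond(pi, i):
    return entropy(pi) - entropy(marginal(pi, 3 - i))

def rate_i(beta, pi, i):
    return beta * entropy(marginal(pi, i)) + (1 - beta) * h_cond(pi, i)

def beta_for_ratio(alpha, pi):
    """Weight b with rate_i(1-b, pi, 2) = alpha * rate_i(b, pi, 1), clipped to [0, 1]."""
    H1, H2 = entropy(marginal(pi, 1)), entropy(marginal(pi, 2))
    h1, h2 = h_cond(pi, 1), h_cond(pi, 2)
    if alpha * H1 <= h2:      # ratio below the corner (H1, h2)
        return 1.0
    if alpha * h1 >= H2:      # ratio above the corner (h1, H2)
        return 0.0
    return (H2 - alpha * h1) / (alpha * H1 - h2 + H2 - alpha * h1)

pi = np.array([[0.4, 0.1], [0.2, 0.3]])
alpha = 1.0
b = beta_for_ratio(alpha, pi)
r1, r2 = rate_i(b, pi, 1), rate_i(1 - b, pi, 2)
print(b, r1, r2, r2 / r1)
```

Inside the attainable range the resulting pair lies on the line joining the two corner points and has exactly the requested ratio.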

The sets $\mathcal{R}_\alpha$ and $\mathcal{R}_\beta$ may be improved in
the following way. If one of the encoders can determine from its marginal
that the other encoder is using a rate higher than necessary it may reduce
its own rate. Applied to $\mathcal{R}_\beta$ this yields the set

$$\mathcal{R}_{\beta 1} = \{(R_1,R_2) : R_1 \ge \sup[R_1(\beta,\pi) - \sup\{R_2(1-\beta,\pi') : \pi' \in A_2(\pi_2)\} + R_2(1-\beta,\pi) : \pi \in A_1(p_1)],\ R_2 \ge \sup[R_2(1-\beta,\pi) : \pi \in A_2(p_2)]\}. \tag{63}$$


Since $\pi \in A_2(\pi_2)$ it is clear that $\mathcal{R}_{\beta 1}$ is at
least as good as $\mathcal{R}_\beta$. To show
$\mathcal{R}_{\beta 1} \subset \mathcal{R}$ suppose that the source
distribution is p. Then

$$R_1 \ge R_1(\beta,p) - \sup\{R_2(1-\beta,\pi) : \pi \in A_2(p_2)\} + R_2(1-\beta,p)$$

and

$$R_2 \ge \sup\{R_2(1-\beta,\pi) : \pi \in A_2(p_2)\}$$

so $R_1 + R_2 \ge \mathcal{H}(p)$ and
$\mathcal{R}_{\beta 1} \subset \mathcal{R}$. The same modification applied
to $\mathcal{R}_\alpha$ yields a set $\mathcal{R}_{\alpha 1}$ which is
defined as $\mathcal{R}_{\beta 1}$ with $\beta$ replaced by $\beta(\pi)$, and
the same reasoning shows that it is achievable. Similar sets
$\mathcal{R}_{\alpha 2}$ and $\mathcal{R}_{\beta 2}$ may be defined switching
the roles of the encoders.

Figure 7. Sketch of regions $\mathcal{R}_\alpha$ and $\mathcal{R}_\beta$.
An example where $\mathcal{R}_{\alpha 1} \ne \mathcal{R}_\alpha$ and
$\mathcal{R}_{\beta 1} \ne \mathcal{R}_\beta$ is given in Figure 8. Here A
consists of two sources $\pi$ and $\pi'$ such that $\pi_1 = \pi_1'$ but
$\pi_2 \ne \pi_2'$. In Figure 8, $\pi$ is the true distribution of the
source. For this example $\mathcal{R}_{\alpha 2} = \mathcal{R}_\alpha$ and
$\mathcal{R}_{\beta 2} = \mathcal{R}_\beta$ so the improved sets are not
uniformly better, nor are they necessarily equal. The Slepian-Wolf region
for $\pi$ is denoted $\mathcal{R}^*(\pi)$ in the figure.

The following example shows that even the union of all of these sets
is not the achievable rate region. Here $A = \{\pi,\pi',\pi''\}$ and these
sources have Slepian-Wolf rate regions as in Figure 9. Note that
$\pi_1 = \pi_1'' \ne \pi_1'$, $\pi_2 = \pi_2' \ne \pi_2''$,
$\mathcal{H}(\pi_1) < \mathcal{H}(\pi_1')$, and
$\mathcal{H}(\pi_2) < \mathcal{H}(\pi_2'')$. The various sets are given in
Figure 9 assuming that $\pi$ is in effect. A better set $\hat{\mathcal{R}}$
may be derived as follows.

If $\pi_i$ is the observed marginal then the i-th encoder codes as for
$\mathcal{R}_\alpha$ but using $\{\pi\}$ in place of $A_i(\pi_i)$. So if $\pi$
is the actual source then the set $\hat{\mathcal{R}}$ (see Figure 9) is
achieved. If $\pi'$ is in effect encoder 1 can determine this and use rate

$$R_1 = R_1(\beta(\pi'),\pi') + R_2(\beta(\pi'),\pi') - R_2(\beta(\pi),\pi)$$

and the $\pi'$ rate region is achieved. If $\pi''$ is in effect then encoder
2 does similarly and the $\pi''$ rate region is achieved. So even the union
of these sets of achievable rate pairs is not the achievable rate region.

CHAPTER 7
CONCLUSION 

We  have  considered  a  number  of  problems  in  source  coding  for  pairs 
of  correlated  discrete  memoryless  sources  for  the  situation  in  which  the 
distribution  of  the  source  outputs  is  not  precisely  known  by  the  encoders 
or  the  decoder.  We  have  assumed  only  that  the  source  is  in  some  class. 

For  problems  where  decoding  with  arbitrarily  low  distortion  is  required 
we  showed  that  universal  coding  (as  defined  for  single-user  problems)  is 
not  generally  possible.  We  defined  robust  codes  as  codes  which  achieve 
the optimum performance for one source in the class and achieve
performance  which  is  no  worse  (i.e.,  no  larger  rate  or  distortion)  for  the 
other  sources  in  the  class.  For  the  side  information  problem  and  the 
corner  point  of  the  Slepian-Wolf  region  the  rate  for  one  of  the  encoders 
does  not  depend  on  the  class  of  possible  joint  distributions.  In  these  two 
problems  we  found  that  the  optimum  performance  was  achieved  by  robust 
coding  for  a  particular  subset  of  the  class  of  sources.  A  bound  was  given 
for  the  optimum  performance  for  the  Slepian-Wolf  problem,  but  the 
achievable  rate  region  was  not  determined. 

For  each  of  the  above  problems  the  converse  to  the  robust  coding 
theorem  was  provided  by  the  converse  to  the  corresponding  problem  in 
which  the  source  distribution  is  known.  That  is,  the  converse  is  derived 
from  the  fact  that  the  converse  for  some  individual  source  in  the  class 
(or in some subset of the class as determined by the encoders) requires
the rate to be at least that of the positive coding theorem for the entire
class. This technique may not be used to prove the converse for the
Wyner-Ziv problem because the counterexample of Section 5.2 shows that
for some classes of sources the optimal performance is strictly worse
than that for all individual sources in the class.

APPENDIX A

MODIFICATIONS  NECESSARY  FOR  THE  PROOFS  OF  CHAPTERS  3  AND  4 
WITHOUT  THE  ASSUMPTION  OF  KNOWN  MARGINALS 

Here we assume that the i-th encoder has no a priori knowledge of the
i-th marginal and that the decoder has no a priori information on the
joint distribution p. If f and g are real-valued functions defined on
a finite set $\mathcal{X}$ then we define

$$d_m(f,g) = \max\{|f(u) - g(u)| : u \in \mathcal{X}\}. \tag{65}$$

Let $\mathcal{P}$ represent the space of all discrete memoryless sources
with finite alphabet $\mathcal{X}_1 \times \mathcal{X}_2$. If we define

$$J = \{\pi \in \mathcal{P} : \pi(u,v) \ge \epsilon' \text{ for all } (u,v) \in \mathcal{X}_1 \times \mathcal{X}_2\} \tag{66}$$

then for any $p \in \mathcal{P}$

$$\inf\{d_m(p,\pi) : \pi \in J\} < 2\epsilon'. \tag{67}$$

For some initial time (i.e., based on an initial block of source output
symbols) the i-th encoder estimates the marginal $p_i$. Call this estimate
$\hat\pi_i$. This initial time may be chosen such that

$$P\{|p_i(u) - \hat\pi_i(u)| > \epsilon'' \text{ for some } u \in \mathcal{X}_i\} < \delta' \tag{68}$$

for any $\epsilon'', \delta' > 0$. From (67) a source $\hat p \in J$ exists
such that $d_m(\hat\pi_i,\hat p_i) < 2|\mathcal{X}|\epsilon'$ and so for this
$\hat p$ we have

$$P\{|p_i(u) - \hat p_i(u)| > \epsilon \text{ for some } u \in \mathcal{X}_i\} < \delta' \tag{69}$$

by the triangle inequality, where
$\epsilon = \epsilon'' + 2|\mathcal{X}|\epsilon'$ and
$|\mathcal{X}| = \max\{|\mathcal{X}_i| : i = 1,2\}$. The marginal $\hat p_i$
is then used to encode some block of source outputs. Assuming that the
entire block is in error if $\hat p_i(u)$ is not within $\epsilon$ of
$p_i(u)$ for all $u \in \mathcal{X}_i$, this introduces a probability of
error less than $2\delta'$ for the coding scheme.

The decoder makes an estimate $\hat\pi$ of the joint distribution p by
using rates $\mathcal{H}(\hat p_i)$ (i = 1,2) for some initial time. The
decoder then finds a $\hat p \in J$ which is within $2\epsilon'$ of
$\hat\pi$. This initial time may be chosen such that

$$P\{|p(u,v) - \hat p(u,v)| > \hat\epsilon \text{ for some } (u,v) \in \mathcal{X}_1 \times \mathcal{X}_2\} < \delta' \tag{70}$$

where $\hat\epsilon = \epsilon|\mathcal{X}|^{-1}$ and $\delta'$ and
$\epsilon$ are defined in (69). As with the marginals we now assume
$d_m(p,\hat p) \le \hat\epsilon$, adding a block probability of error of
$\delta'$.

The encoders may obtain sets of sequences with properties similar to
those of the $T_n(\delta,p_i)$ using these estimated marginal distributions
as follows. Let the i-th encoder define $\mathbf{u}$ to be typical if

$$|n^{-1} N(u|\mathbf{u}) - \hat p_i(u)| < \delta|\mathcal{X}_i|^{-1} + \epsilon \tag{71}$$

for all $u \in \mathcal{X}_i$, where $\hat p_i$ satisfies
$|\hat p_i(u) - p_i(u)| < \epsilon$. Call the set of such sequences
$T_n'(\delta,\hat p_i)$. Then from (1) and (71)

$$T_n(\delta,p_i) \subset T_n'(\delta,\hat p_i) \tag{72}$$

hence

$$P[\mathbf{u} \in T_n'(\delta,\hat p_i)] \to 1 \text{ as } n \to \infty$$

for any $\epsilon, \delta > 0$. Also

$$T_n'(\delta,\hat p_i) \subset T_n(\delta + 2\epsilon|\mathcal{X}|, p_i)$$

so

$$\exp_2[n(\mathcal{H}(p_i) - k)] < |T_n'(\delta,\hat p_i)| < \exp_2[n(\mathcal{H}(p_i) + k)]$$

where

$$k = -[\delta + 2\epsilon|\mathcal{X}|] \log \epsilon' \ge -[\delta|\mathcal{X}_i|^{-1} + 2\epsilon] \sum_u \log p_i(u).$$

Note that k may be made arbitrarily small by choice of $\epsilon$,
$\delta$, and $\epsilon'$. Here we use the lower bound on $p_i(u)$ provided
by (66).
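
The relaxed typicality test (71) is straightforward to state in code. The sketch below assumes symbols are the integers 0, 1, ..., and that the estimated marginal is given as a list indexed by symbol; the function name `is_typical` and the sample sequence are hypothetical.

```python
from collections import Counter

def is_typical(seq, p_hat, delta, eps):
    """Test (71): every empirical frequency lies within delta/|X| + eps of p_hat."""
    n = len(seq)
    counts = Counter(seq)
    threshold = delta / len(p_hat) + eps
    return all(abs(counts.get(u, 0) / n - p_hat[u]) < threshold
               for u in range(len(p_hat)))

seq = [0, 0, 1, 0]                 # empirical frequencies 0.75 and 0.25
p_hat = [0.7, 0.3]                 # estimated marginal
loose = is_typical(seq, p_hat, delta=0.10, eps=0.02)   # threshold 0.07
tight = is_typical(seq, p_hat, delta=0.02, eps=0.02)   # threshold 0.03
print(loose, tight)
```

The extra slack in the threshold is what allows a sequence that is typical for the true marginal to remain typical with respect to a nearby estimate.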

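The decoder defines the set $J_n'(\delta,\hat p)$ of jointly typical pairs
$(\mathbf{u},\mathbf{v})$ as those satisfying

$$|n^{-1} N(u,v|\mathbf{u},\mathbf{v}) - \hat p(u,v)| < \delta|\mathcal{X}_1|^{-1}|\mathcal{X}_2|^{-1} + \hat\epsilon$$

and so

$$\exp_2[n(\mathcal{H}(p) - k)] < |J_n'(\delta,\hat p)| < \exp_2[n(\mathcal{H}(p) + k)]$$

since from (76)

$$k \ge -[\delta|\mathcal{X}_1|^{-1}|\mathcal{X}_2|^{-1} + \epsilon|\mathcal{X}|^{-1}] \sum_{u,v} \log p(u,v). \tag{80}$$

Finally define the set

$$A_i'(\hat p_i) = \{\pi \in A : \max_u |\pi_i(u) - \hat p_i(u)| \le \epsilon\}. \tag{81}$$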

In coding for the corner point of the Slepian-Wolf region (Chap. 3) these
sets $T_n'(\delta,\hat p_i)$ and $A_i'(\hat p_i)$ may be used and yield the
same results in the limit as $\epsilon \to 0$ (recall A is assumed closed).
Note that all information theoretic quantities are continuous bounded
functions of the probability distributions (here the alphabets are finite)
so the use of the estimate $\hat p$ instead of p to compute them will result
in a term added to the required rates of the form $f(\epsilon)$ (where
$f(\epsilon) \to 0$ as $\epsilon \to 0$).

The proofs of the three Lemmas used in Chapter 4 may be easily
modified to hold for the typical sets of this section simply by using (73)
and (75) in place of properties 1 and 2 of Chapter 2. The functions of
$\delta$ in (24) and (30) become functions of $\epsilon$ and $\delta$ which
approach zero as $\epsilon$ and $\delta$ approach zero. These functions
remain independent of p because of the uniform bound of (66).

APPENDIX B

PROOF  OF  THE  COUNTER  EXAMPLE  FOR  THE  WYNER-ZIV  PROBLEM 

Here we show that the auxiliary r.v. Z in the Wyner-Ziv problem must
be defined as the output of a BSC with X as input in order for Z to be
optimal for the DSBS. The notation used and equations referred to in this
appendix are from [4].

Referring to equations (42) and (43), if the $d_z$ are not identical
then we have

$$R = I(X;Z) - I(Y;Z) \ge \theta \sum_{z \in A} \lambda_z G(d_z) > \theta\, G\Big(\sum_{z \in A} \lambda_z d_z\Big)$$

since $G(d) = h(p_0 * d) - h(d)$ is a strictly convex function of d for
fixed $p_0 \in (0,\tfrac12)$. By definition,

$$d_z = E[d(X,\hat X) | Z = z] = P\{X \ne Y(Z) | Z = z\}$$

and Y(z) has range {0,1}. If the $d_z$ are identical (for all $z \in A$)
then by the definition of Y(z), A may be divided into two sets
$A_i = \{z : Y(z) = i\}$, i = 0,1, such that all $z \in A_i$ are equivalent
in the sense that

$$P\{X = x, Y = y | Z = z_0\} = P\{X = x, Y = y | Z = z_1\} \text{ for all } x, y,$$

if $z_0$ and $z_1 \in A_i$. So for the $z \in A$ the channel defining Z is a
BSC with parameter $d_z$. The set $A^c$ may be regarded as time sharing
since from (36)

$$E[d(X,\hat X) | Z \in A^c] = p_0,$$

and distortion $p_0$ may be achieved with R = 0. Furthermore, the last line
of (40) is exactly the rate required by time sharing between the BSC with
output Y(z) and parameter $d_z$ and no channel at all. For equality to hold
in (40), so that the rate for Z is the same as this time sharing, we must
have

$$\sum_{z \in A^c} [H(Y|Z = z) - H(X|Z = z)]\, P\{Z = z\} = 0$$

which implies

$$P[Z = z | X = 0] = P[Z = z | X = 1]$$

for $z \in A^c$. So for a r.v. Z to achieve the lowest possible rate, no
information about X can be given by the event $Z \in A^c$. So the optimal
channel must be a combination of a BSC which is used with some
probability $\theta$, and a channel which gives no information which is
used with probability $(1 - \theta)$.

We have shown that a BSC with some level of time sharing must be used
to attain optimal performance. It is easily seen that the parameter of
the BSC and level of time sharing are unique. Up to the point where time
sharing begins no two BSC's will achieve the same rate, since G(d) is
strictly decreasing. G(d) is strictly convex so in the time sharing
region no other BSC with time sharing will do as well as the BSC with
crossover probability $d^*$ by the definition of $d^*$.
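
The two properties of G(d) used in this appendix, strict convexity and strict monotonicity, can be spot-checked numerically. The sketch below evaluates G at a few sample distortions for a cascade parameter of .1; the sample points, the equal weights, and the function names are arbitrary choices made for illustration, and a finite check of this kind is of course not a proof.

```python
import math

def h(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def G(d, p0):
    """h(p0 * d) - h(d), where p0 * d = p0 (1 - d) + (1 - p0) d."""
    return h(p0 * (1 - d) + (1 - p0) * d) - h(d)

p0 = 0.1
d_values = [0.01, 0.09]            # two distinct distortions d_z
weights = [0.5, 0.5]               # weights lambda_z summing to one

average_of_G = sum(l * G(d, p0) for l, d in zip(weights, d_values))
G_of_average = G(sum(l * d for l, d in zip(weights, d_values)), p0)
print(average_of_G, G_of_average)
```

When the distortions are not all equal the average of G strictly exceeds G of the average, which is the strict inequality invoked at the start of the appendix.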